Overview

The EDL pipeline can be configured using three boolean flags at the top of run_full_pipeline.py. These settings control data fetching behavior, optional datasets, and cleanup operations.

Configuration Flags

All configuration flags are located in run_full_pipeline.py at lines 61-71:
# ═══════════════════════════════════════════════════
# Configuration
# ═══════════════════════════════════════════════════

# OHLCV: Auto-detect mode
# True = always fetch (incremental update: ~2-5 min if data exists, ~30 min first time)
# False = skip entirely (ADR, RVOL, ATH, % from ATH fields will be 0)
FETCH_OHLCV = True

# Set to True to also fetch standalone data (Indices, ETFs)
FETCH_OPTIONAL = False

# Auto-delete intermediate files after pipeline succeeds
# Keeps: all_stocks_fundamental_analysis.json.gz + ohlcv_data/
CLEANUP_INTERMEDIATE = True
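As a rough illustration of how three flags like these could gate pipeline stages, here is a minimal sketch. The function `plan_phases` and the phase labels are hypothetical and are not taken from run_full_pipeline.py; the ordering is an assumption based only on the descriptions in this page (optional datasets run as a late phase, cleanup runs after compression).

```python
# Hypothetical sketch: plan_phases() and the phase labels are illustrative only.
FETCH_OHLCV = True
FETCH_OPTIONAL = False
CLEANUP_INTERMEDIATE = True


def plan_phases(fetch_ohlcv, fetch_optional, cleanup):
    """Return the ordered list of stages that would run for a flag combination."""
    phases = ["core fundamentals"]
    if fetch_ohlcv:
        phases.append("ohlcv fetch")
    if fetch_optional:
        phases.append("optional datasets (indices, ETFs)")
    phases.append("compress output")
    if cleanup:
        # Cleanup is placed last because it runs after compression succeeds.
        phases.append("cleanup intermediate files")
    return phases


print(plan_phases(FETCH_OHLCV, FETCH_OPTIONAL, CLEANUP_INTERMEDIATE))
```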

FETCH_OHLCV

Type: boolean, default: True
Controls whether to fetch historical OHLCV (Open, High, Low, Close, Volume) data for all stocks.
Behavior:
  • True: Fetches lifetime OHLCV data using smart incremental updates
    • First run: ~30 minutes (downloads full history from 1976)
    • Subsequent runs: ~2-5 minutes (only fetches new data)
    • Enables ADR, RVOL, ATH, and % from ATH calculations
  • False: Skips OHLCV fetching entirely
    • Pipeline runs ~4 minutes faster
    • Fields that depend on OHLCV will show 0 or null:
      • 5/14/20/30 Days MA ADR(%)
      • RVOL
      • % from ATH
      • Returns since Earnings(%)
      • Max Returns since Earnings(%)
When to disable:
  • Testing pipeline changes without needing price data
  • Running quick fundamental-only refreshes
  • Network bandwidth constraints
Files affected:
  • Creates/updates: ohlcv_data/{SYMBOL}.csv (one file per stock)
  • Creates/updates: indices_ohlcv_data/ directory for index data
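The incremental behavior described above can be pictured as "find the last stored date, then fetch from the next day". The sketch below is one possible way to do that and is not the pipeline's actual code. It assumes each ohlcv_data/{SYMBOL}.csv has an ISO-formatted `Date` column; the column name, the helper `next_fetch_start`, and the exact start day in 1976 are all assumptions.

```python
import csv
from datetime import date, timedelta
from pathlib import Path

# The page only says "from 1976"; January 1 is an assumed placeholder.
FULL_HISTORY_START = date(1976, 1, 1)


def next_fetch_start(csv_path, date_column="Date"):
    """Return the first date that still needs fetching for one symbol's CSV."""
    path = Path(csv_path)
    if not path.exists():
        # No file yet: this is the first run, so fetch the full history.
        return FULL_HISTORY_START
    last = None
    with path.open(newline="") as fh:
        for row in csv.DictReader(fh):
            value = row.get(date_column)
            if value:
                day = date.fromisoformat(value[:10])
                if last is None or day > last:
                    last = day
    # An empty file is treated like a missing one.
    return FULL_HISTORY_START if last is None else last + timedelta(days=1)
```

For example, `next_fetch_start("ohlcv_data/RELIANCE.csv")` would return the day after the latest row in that file, where the symbol is only a placeholder.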

FETCH_OPTIONAL

Type: boolean, default: False
Enables fetching of standalone datasets not included in the main pipeline output.
Behavior:
  • True: Runs PHASE 6 scripts to fetch:
    • All market indices (all_indices_list.json) - 194 indices
    • ETF data (etf_data_response.json) - 361 ETFs
  • False: Skips PHASE 6 entirely
What gets fetched:
| Script | Output File | Records | Description |
|---|---|---|---|
| fetch_all_indices.py | all_indices_list.json | 194 | Nifty 50, Bank Nifty, sectoral indices |
| fetch_etf_data.py | etf_data_response.json | 361 | All exchange-traded funds |
Note: These files are standalone and not merged into all_stocks_fundamental_analysis.json.gz. They’re used separately by the frontend for index tracking and ETF screening.
When to enable:
  • You need fresh index composition data
  • Building ETF comparison features
  • Running a full data refresh for all asset classes
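One way to decide whether a run needs FETCH_OPTIONAL = True is to check how old the two standalone files are. This is a minimal sketch under assumptions: the helper `stale_standalone_files` and the 24-hour threshold are hypothetical, and only the two file names come from this page.

```python
import time
from pathlib import Path

# File names as listed in this page.
STANDALONE_FILES = ["all_indices_list.json", "etf_data_response.json"]


def stale_standalone_files(base_dir=".", max_age_hours=24.0, now=None):
    """Return standalone files that are missing or older than max_age_hours."""
    now = time.time() if now is None else now
    stale = []
    for name in STANDALONE_FILES:
        path = Path(base_dir) / name
        if not path.exists() or now - path.stat().st_mtime > max_age_hours * 3600:
            stale.append(name)
    return stale
```

If the returned list is non-empty, enabling FETCH_OPTIONAL for the next run would refresh those files.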

CLEANUP_INTERMEDIATE

Type: boolean, default: True
Auto-deletes intermediate files after successful pipeline completion.
Behavior:
  • True: Removes all intermediate files and directories after compression
    • Keeps only: *.json.gz files + ohlcv_data/ + indices_ohlcv_data/
    • Frees ~150-200 MB of disk space
  • False: Preserves all intermediate files for debugging
Files deleted when enabled:
INTERMEDIATE_FILES = [
    "master_isin_map.json",
    "dhan_data_response.json",
    "fundamental_data.json",
    "advanced_indicator_data.json",
    "all_company_announcements.json",
    "upcoming_corporate_actions.json",
    "history_corporate_actions.json",
    "nse_asm_list.json",
    "nse_gsm_list.json",
    "bulk_block_deals.json",
    "upper_circuit_stocks.json",
    "lower_circuit_stocks.json",
    "incremental_price_bands.json",
    "complete_price_bands.json",
    "nse_equity_list.csv",
    "all_stocks_fundamental_analysis.json",  # Raw JSON (after .gz is made)
]

INTERMEDIATE_DIRS = [
    "company_filings/",
    "market_news/",
]
When to disable:
  • Debugging pipeline failures
  • Inspecting intermediate data quality
  • Running custom analysis on raw outputs
  • Developing new pipeline stages
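The cleanup step can be approximated as "delete each listed file and directory if it exists". The sketch below is an assumption about how such a step might look, not the pipeline's actual implementation; `cleanup_intermediate` and its `dry_run` parameter are hypothetical. The dry-run default makes it possible to preview what would be removed.

```python
import shutil
from pathlib import Path


def cleanup_intermediate(base_dir, files, dirs, dry_run=True):
    """Remove listed files and directories under base_dir; return the names matched."""
    base = Path(base_dir)
    removed = []
    for name in files:
        path = base / name
        if path.is_file():
            if not dry_run:
                path.unlink()
            removed.append(name)
    for name in dirs:
        path = base / name
        if path.is_dir():
            if not dry_run:
                shutil.rmtree(path)
            removed.append(name)
    return removed
```

Called with the INTERMEDIATE_FILES and INTERMEDIATE_DIRS lists shown above, it leaves any path that is not listed, such as the .json.gz output and ohlcv_data/, untouched.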

Modifying Configuration

1. Open the pipeline runner

Navigate to the EDL Pipeline directory:
cd "DO NOT DELETE EDL PIPELINE"

2. Edit run_full_pipeline.py

Open the file in your editor:
nano run_full_pipeline.py
# or
vim run_full_pipeline.py

3. Update the flags (lines 64-71)

Modify the values according to your needs:
FETCH_OHLCV = True           # Set to False to skip OHLCV
FETCH_OPTIONAL = True        # Set to True to fetch indices & ETFs
CLEANUP_INTERMEDIATE = False # Set to False to keep intermediate files

4. Save and run the pipeline

python3 run_full_pipeline.py
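For automated setups, the manual edit in the steps above could be scripted instead. The sketch below rewrites a flag assignment in the file's source text with a regular expression. It is a hypothetical helper (`set_flag` is not part of the pipeline) and assumes each flag is assigned exactly as `NAME = True` or `NAME = False` at the start of a line.

```python
import re


def set_flag(source, name, value):
    """Return source with the first `name = True/False` assignment set to value."""
    pattern = re.compile(
        rf"^({re.escape(name)}\s*=\s*)(True|False)\b", re.MULTILINE
    )
    # A function replacement avoids any backslash-escape handling in the new text.
    new_source, count = pattern.subn(
        lambda m: m.group(1) + str(bool(value)), source, count=1
    )
    if count == 0:
        raise ValueError(f"flag {name!r} not found")
    return new_source


example = "FETCH_OHLCV = True  # price data\nFETCH_OPTIONAL = False\n"
print(set_flag(example, "FETCH_OPTIONAL", True))
```

To apply it to the real file, read run_full_pipeline.py as text, pass the contents through `set_flag`, and write the result back; trailing comments on the line are preserved.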

Common Configuration Scenarios

Quick Fundamental Refresh (No OHLCV)

FETCH_OHLCV = False
FETCH_OPTIONAL = False
CLEANUP_INTERMEDIATE = True
Runtime: ~4 minutes
Use case: Testing, quick fundamental updates

Full Production Refresh

FETCH_OHLCV = True
FETCH_OPTIONAL = True
CLEANUP_INTERMEDIATE = True
Runtime: ~35 minutes (first run), ~8 minutes (incremental)
Use case: Daily automated refresh, complete data update

Development/Debugging Mode

FETCH_OHLCV = True
FETCH_OPTIONAL = False
CLEANUP_INTERMEDIATE = False
Runtime: ~30 minutes (first run), ~6 minutes (incremental)
Use case: Inspecting intermediate outputs, debugging pipeline stages

Impact on Output Fields

When FETCH_OHLCV = False, the following fields in all_stocks_fundamental_analysis.json.gz will be 0 or null:
| Field | Default Value (No OHLCV) |
|---|---|
| 5 Days MA ADR(%) | 0 |
| 14 Days MA ADR(%) | 0 |
| 20 Days MA ADR(%) | 0 |
| 30 Days MA ADR(%) | 0 |
| RVOL | 0 |
| % from ATH | 0 |
| Returns since Earnings(%) | 0 |
| Max Returns since Earnings(%) | 0 |
All other fundamental, technical indicator, and news fields remain unaffected.
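To confirm whether a given output file was built with or without OHLCV data, the affected fields can be inspected directly. This is a sketch under a stated assumption: the decompressed JSON is taken to be a top-level list of per-stock objects keyed by the field names in the table, which this page does not confirm. The helper names are hypothetical.

```python
import gzip
import json

# Field names copied from the table above.
OHLCV_FIELDS = [
    "5 Days MA ADR(%)",
    "14 Days MA ADR(%)",
    "20 Days MA ADR(%)",
    "30 Days MA ADR(%)",
    "RVOL",
    "% from ATH",
    "Returns since Earnings(%)",
    "Max Returns since Earnings(%)",
]


def load_records(path="all_stocks_fundamental_analysis.json.gz"):
    """Load the compressed output; assumes a top-level JSON list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def populated_ohlcv_fields(records):
    """Return OHLCV-dependent fields with a non-zero, non-null value in any record."""
    return [
        field
        for field in OHLCV_FIELDS
        if any(rec.get(field) not in (0, None) for rec in records)
    ]
```

An empty result from `populated_ohlcv_fields(load_records())` would suggest the file came from a run with FETCH_OHLCV = False.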